Abstract

With a vast number of items, web-pages, and news to choose from, online services
and the customers both benefit tremendously from personalized recommender systems.
Such systems, however, provide great opportunities for targeted advertisements, by
displaying ads alongside genuine recommendations. We consider a biased recommendation
system where such ads are displayed without any tags (disguised as genuine
recommendations), rendering them indistinguishable to a single user. We ask whether it is
possible for a small subset of collaborating users to detect such a bias. We propose an
algorithm that can detect such a bias through statistical analysis of the collaborating
users’ feedback. The algorithm requires only binary information indicating whether a
user was satisfied with each recommended item or not. This makes the algorithm
widely applicable to real-world issues such as identification of search engine bias and
pharmaceutical lobbying. We prove that the proposed algorithm detects the bias with
high probability for a broad class of recommendation systems when a sufficient number
of users provide feedback on a sufficient number of recommendations. We provide
extensive simulations with real data sets and practical recommender systems, which confirm
the trade-offs in the theoretical guarantees.


1 Introduction 

The growth of online services has provided a vast variety of choices to users. This choice 
exists today in multiple domains including e-commerce with a variety of products, and online 
entertainment (NetFlix, Pandora). With users having to choose from an overwhelming set of 
items, recommender systems have become indispensable in easing the information overload 
and search complexity. Recommender systems are not restricted to retail businesses. A 
search engine like Google can be viewed as a recommendation engine that helps users find
relevant information by ranking the search results according to their search criteria, history 
and other personal information. Social networking sites like Twitter and Facebook display 
Tweets and News Feed based on users’ past behavior and their connections to other users. 
News portals like Yahoo! News also present personalized content to online news readers.

Personalized recommender systems serve as an attractive platform for advertisers to reach
their targeted consumers. It is now customary to see ads alongside other genuine
recommendations in many of the websites that provide recommendation services. One can distinguish
these ads from genuine recommendations, for example, by the location of their placement
or by their special tags. But recommendation engines are not legally obliged to facilitate
such distinction and could possibly serve these ads mixed with genuine recommendations in
a manner that renders them indistinguishable to users. Such a biased recommender system
can have far-reaching consequences, including user dissatisfaction with the recommendations
[1]. A recent survey by Facebook shows that users find sponsored ads mixed with genuine
posts in their News Feed more annoying than the explicit, well-separated ads [2]. Social
and political consequences of bias in the context of media and online content have also been
studied.

Modern recommender systems, in general, consist of two components: (i) learn individual 
preferences from user feedback, and (ii) recommend items to users based on the estimated 
preferences. This combination of learning and recommending is bound to be noisy (the 
learning phase will explore individual preferences typically by presenting “random”
recommendations), and several recommendations to users will likely be ineffective. Critically, both
noise and bias manifest as bad recommendations to users. However, noise is benign and is 
a consequence of learning, while bias is systematic and is to be deprecated. Thus, a basic 
question of interest to the users of such systems is whether or not such a biased recommender 
system can be detected. This is a broad question, and detecting bias in its most general 
sense is out of the scope of this paper. We focus on detecting a specific type of bias where
recommendation engines systematically favor a few items over other better or at least equally 
good items, contrary to what an objective or unbiased system would do. 

It should be noted that, with most service providers being non-transparent about their 
recommendation strategies, one cannot hope to know the exact statistical profile of the
recommendation engine a priori. Therefore, the key is to identify the primary features that 
can be used to differentiate between the two types without any a priori knowledge about the 
particulars of the recommendation strategy. One could, for instance, consider the average 
rating or the average number of ineffective recommendations as the performance measure and 
make a decision based on a threshold parameter. However, as we also demonstrate through 
simulations, such a basic algorithm based solely on average performance cannot distinguish 
between deliberate systematic bias and innocuous random errors. This brings us to the key 
question: Can we develop a better method to expose a biased recommender system? 

1.1 Contributions of this paper 

We say a recommendation engine is biased, if it systematically favors a small set of items over 
other items in the database irrespective of users’ preferences. On the other hand, we say that 
a recommendation engine is objective, if it satisfies a simple monotonic property in its
recommendations to users - better suited items are given higher priority (in a statistical sense).
The primary goal of this paper is then to develop algorithms to answer the following
question: Can a meaningful distinction be drawn between objective and biased recommendation
engines? 

BiAD Algorithm: We propose an anomaly detection algorithm that we call Binary feedback
Anomaly Detector (BiAD), which uses a statistical approach to identify a biased
recommendation engine. Under appropriate conditions on the size of the ad-pool, the
aggressiveness of the biased recommender system, and the number of users/samples, we show that
BiAD correctly (with high probability) distinguishes between objective and biased
recommendation engines.

The algorithm leverages user collaboration, and is based on the observation that a
biased system is typically characterized by the occurrence of a large number of ineffective
recommendations in a small set of items. On the contrary, giving higher priority to more 
effective items, as in an objective recommender system, precludes such concentration in a 
small set. Notably, since the users are not aware of the set of items, the BiAD algorithm 
is adaptive - as the recommender system learns users, the users “learn” the recommender 
system. Further, our algorithm relies only on binary feedback on the effectiveness of the 
recommendation. Finally, the BiAD algorithm also works for a large class of recommender 
systems since our model does not place any constraints on the recommendation engine other 
than mild statistical conditions. We finally present extensive simulation results that cover
various types of recommender systems and data sets to illustrate the wide applicability of 
the algorithm. 

1.2 Related Work 

Following the recent successes of targeted advertising services, there have been several
empirical studies that investigate the effects of displaying sponsored content alongside organic
content. There have also been attempts to explain such effects through theoretical
models. In addition, several researchers have worked on designing systems and
algorithms from the content provider’s perspective for revenue maximization through efficient
auction of the ad-space and from the advertiser’s perspective for effectively reaching
the target audience. It is empirically shown in [1] that customers are less likely to
select recommendations which are tagged as “advertisement” or “sponsored”, motivating the
advertisers to remove such tags.

Prior work on anomaly detection in recommender systems exists from the perspective of
a recommendation engine as a victim of false user-profile injections. To the best of
our knowledge, ours is the first work that considers the problem from the users’ perspective
and proposes a mechanism for detection of bias in recommendation engines.


2 System Model 

In this section, we describe our assumptions about the structural properties of objective 
and biased recommender systems by the means of a probabilistic model. This model does 
not include any particulars about the working of the recommendation engine and therefore 
typifies a broad class of recommender systems. Before we proceed to describe the model in
detail, the salient features of this model are listed below: 

• An objective recommendation engine has a fairly good estimate of the user preferences.

• An objective recommendation engine follows the monotonic property - higher preference
to higher ranked items.

• A biased recommendation engine systematically gives preference to a small set of items 
irrespective of users’ tastes. 

Notation: Our notation $O$, $o$, $\Theta$, $\Omega$, $\omega$ to describe the asymptotics of various parameters
with increasing size of the database (total number of items in the database) is according
to the standard Landau notation. We say that an event occurs with high probability if the
probability of the event tends to 1 as the size of the database goes to infinity. We use $\mathbb{1}\{\cdot\}$
to represent the indicator function, i.e.,

$$\mathbb{1}\{E\} := \begin{cases} 1 & \text{if event } E \text{ occurs,} \\ 0 & \text{otherwise.} \end{cases}$$


Equality and inequality between random variables always refer to almost sure (with
probability 1) conditions unless otherwise specified. For example, if $X$ and $Y$ are two random
variables, then $X = Y$ means $X = Y$ a.s. For any given matrix $R$, the $u$-th row of $R$ is
represented by $R_u$.


2.1 User-Item Database 

The recommendation engine recommends products to users from a large database of m items 
indexed from 1 to m. A user’s opinion about an item is represented by a numerical value 
that we call the user’s rating of that item. It should be noted that these ratings are only 
an implicit representation of true opinions of the users - higher the rating, better suited is 
the item for the user. We denote the user-item rating matrix for the entire database by R, 
where rows indicate users and columns indicate items. We introduce a parameter called the 
efficacy threshold, denoted by $\eta$, which is used to represent opinions on a binary scale. We
assume that a user is satisfied with a recommendation if the rating of the recommended item
is greater than or equal to $\eta$. We refer to such a recommendation and item as being effective
for that user.

Definition 1 (Effective & Ineffective). An item $i$ is effective for a user $u$ if the rating of that
item by the user, $R_{ui}$, is at least $\eta$. Similarly, a recommendation is said to be effective for a
user if the recommended item is effective. An item or recommendation that is not effective
is said to be ineffective.

Let $f_u(\eta, [m])$ denote the number of items in the database $[m]$ whose rating is greater
than or equal to $\eta$ for user $u$. In other words, it is the number of effective items in the
database for user $u$.

Let us define the function $F : \mathbb{R} \times \mathbb{R}^m \to \mathbb{R}$ as follows:

$$F(r, R_u) := |\{i : R_{ui} \geq r\}|,$$

where $R_{ui}$ is the $i$-th element of the $m$-length vector $R_u$. This function is used to find the
number of items whose rating is at least $r$ for any player $u$ if the ratings of all the items
in the database for player $u$ are given by $R_u$. For example, if $R_u$ is the row corresponding to
player $u$ in the rating matrix $R$, then $F(\eta, R_u)$ is equal to $f_u(\eta, [m])$, the number of effective
items for user $u$. Similarly, $F(R_{ui}, R_u)$ gives the rank of item $i$ for user $u$. Also note that for
any given $R_u$, $F(r, R_u)$ is a non-increasing function of $r$.
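The counting function $F$ can be illustrated with a short sketch. It assumes that one user's ratings are stored in a plain Python list and that $F$ counts ratings that are at least $r$, which is the reading under which $F(\eta, R_u)$ equals the number of effective items. The names `count_at_least`, `num_effective` and `rank_of`, and the toy ratings, are hypothetical.

```python
from typing import Sequence


def count_at_least(r: float, ratings: Sequence[float]) -> int:
    """F(r, R_u): number of items whose rating is at least r."""
    return sum(1 for x in ratings if x >= r)


def num_effective(eta: float, ratings: Sequence[float]) -> int:
    """f_u(eta, [m]): number of effective items for this user."""
    return count_at_least(eta, ratings)


def rank_of(i: int, ratings: Sequence[float]) -> int:
    """F(R_ui, R_u): rank of item i (1 = best) for this user."""
    return count_at_least(ratings[i], ratings)


# Toy ratings of one user for m = 5 items (illustrative values only).
ratings_u = [4.5, 2.0, 3.5, 5.0, 1.0]
eta = 3.5  # efficacy threshold

print(num_effective(eta, ratings_u))  # 3 items are rated at least 3.5
print(rank_of(3, ratings_u))          # item 3 has the top rating, so rank 1
```

Because raising $r$ can only shrink the set being counted, `count_at_least` is non-increasing in its first argument, which is the property used later in the analysis.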


2.2 Recommendation Engine 

We next describe the behavior of a recommendation engine using a probabilistic model. Let
$X_{ui}(t)$ indicate whether item $i$ has been recommended to user $u$ at time $t$, i.e.,

$$X_{ui}(t) := \begin{cases} 1 & \text{if item } i \text{ is recommended to user } u \text{ at time } t, \\ 0 & \text{otherwise.} \end{cases}$$

We make the following assumption about any recommender system: An item that has been
recommended to a user once is not recommended to the same user again, i.e., for any user
$u$ and item $i$, $\sum_t X_{ui}(t) \leq 1$.


2.2.1 Objective Recommendation Engine 


An objective recommendation engine is considered to consist of two components - one is
the learning strategy which estimates the user-item rating matrix by the means of available
feedback from users, and another is the recommendation strategy which generates
recommendations based on the estimated user preferences. Our model does not specify the details
of the learning strategy except requiring that the output of the strategy, that is the
estimate of the user-item matrix, be close to the original rating matrix, $R$. Therefore, this model
could be applied to a wide class of recommendation engines which estimate users’ preferences
fairly well. Let the estimate of the rating matrix at time $t$ be denoted by $\hat{R}(t) = [\hat{R}_{ui}(t)]$.
This estimate is modeled as the sum of the original rating matrix and an additive noise
matrix whose elements are independent across users, items and time. This can be written as
$\hat{R}(t) = R + \epsilon(t)$, where $\epsilon(t) = [\epsilon_{ui}(t)]$ is the noise matrix and $\epsilon_{ui}(t)$ is independent of $\epsilon_{u'i'}(t')$
for all $(u, i, t) \neq (u', i', t')$.

The recommendation strategy uses the estimated user-item rating matrix $\hat{R}(t)$ to make
recommendations at time $t$. The following model characterizes the behavior of an objective
recommendation strategy:


1. Recommendations are made based on a user-item weight matrix, denoted by $W(t) = [W_{ui}(t)]$.
This is a stochastic matrix (rows sum to one), which is updated based on the
current estimate of the rating matrix, $\hat{R}(t)$.

2. Given the weight matrix, a user is given a recommendation by choosing an item
randomly, independent of everything else, with weights given by the row corresponding to
the user in the user-item weight matrix.

3. At any time $t$, the weight matrix $W(t)$ satisfies the following monotonic property: if $i$
and $j$ are two items that have not been shown to user $u$ and the ratings are such that
$\hat{R}_{ui}(t) > \hat{R}_{uj}(t)$, then the weights satisfy $W_{ui}(t) > W_{uj}(t)$.
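One engine that fits this three-part model can be sketched as follows. The softmax weighting, the Gaussian estimation noise, and the names `objective_weights` and `recommend` are assumptions made purely for illustration; the model itself only asks for row-stochastic weights that are monotone in the estimated ratings of unseen items.

```python
import math
import random
from typing import List, Sequence, Set


def objective_weights(est_ratings: Sequence[float], shown: Set[int],
                      temperature: float = 1.0) -> List[float]:
    """One row of W(t): softmax over unseen items, zero for shown items.

    Softmax is strictly increasing in the estimated rating, so a higher
    estimate always receives a higher weight (the monotonic property).
    """
    unseen = [i for i in range(len(est_ratings)) if i not in shown]
    if not unseen:
        raise ValueError("every item has already been shown to this user")
    top = max(est_ratings[i] for i in unseen)  # subtract max for stability
    raw = {i: math.exp((est_ratings[i] - top) / temperature) for i in unseen}
    total = sum(raw.values())
    weights = [0.0] * len(est_ratings)
    for i, value in raw.items():
        weights[i] = value / total
    return weights


def recommend(true_ratings: Sequence[float], shown: Set[int],
              noise_std: float, rng: random.Random) -> int:
    """Draw one recommendation for a user and mark the item as shown."""
    # Estimate = true rating + independent zero-mean noise (assumed Gaussian).
    est = [r + rng.gauss(0.0, noise_std) for r in true_ratings]
    w = objective_weights(est, shown)
    item = rng.choices(range(len(w)), weights=w, k=1)[0]
    shown.add(item)
    return item
```

Calling `recommend` repeatedly with the same `shown` set never repeats an item, matching the no-repetition assumption of Section 2.2.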

2.2.2 Biased Recommendation Engine


A biased recommendation engine marks a small set of items, $A$ ($\subset [m]$), from the item
database as ads. To make a recommendation to a user, with probability $\gamma$, independent of
everything else, it chooses an item that has not been shown from the ad-pool, $A$. And with
probability $1 - \gamma$, it can follow any recommendation algorithm (for example, an objective
recommendation algorithm). We refer to $\gamma$ as the bias probability. Note that the strategy for
showing ad items is unspecified except that no item is shown to a user twice. In particular,
the engine may even customize its ad recommendations according to users’ tastes. As in the
case of the complete database, let $f_u(\eta, A)$ denote the number of effective ads in the ad-pool
$A$ for user $u$.
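The biased engine can be sketched as a thin wrapper around any other recommender. In this sketch the uniform choice among unseen ads and the uniform fallback are assumptions (the model leaves both strategies open), and `biased_recommend` and `make_uniform_fallback` are hypothetical names.

```python
import random
from typing import Callable, Iterable, Set


def make_uniform_fallback(m: int, rng: random.Random) -> Callable[[Set[int]], int]:
    """Stand-in for 'any recommendation algorithm': uniform over unseen items."""
    def fallback(shown: Set[int]) -> int:
        unseen = [i for i in range(m) if i not in shown]
        return rng.choice(unseen)  # raises IndexError once everything is shown
    return fallback


def biased_recommend(ad_pool: Iterable[int], shown: Set[int], gamma: float,
                     fallback: Callable[[Set[int]], int],
                     rng: random.Random) -> int:
    """With probability gamma show an unseen ad, otherwise defer to fallback."""
    unseen_ads = [a for a in ad_pool if a not in shown]
    if unseen_ads and rng.random() < gamma:
        item = rng.choice(unseen_ads)
    else:
        item = fallback(shown)
    shown.add(item)  # no item is ever shown to the same user twice
    return item
```

With `gamma = 1.0` the engine exhausts the ad-pool before showing anything else, which is the most aggressive behavior the model allows.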


2.3 Discussion of Assumptions 

Some of the assumptions in the recommender system model above are present only for ease 
of analysis. We discuss below how they can be relaxed in practical settings. 

1. It is assumed that, in any recommendation engine, an item once recommended to a 
user is not recommended to the same user again. This condition is required only to 
ensure that there are no repeated recommendations of sponsored advertisements that 
might be effective. Indeed, if all sponsored ad recommendations are effective, it would 
not be possible to distinguish them from genuine recommendations. This assumption
can therefore be relaxed to require a sufficient number of ineffective ad recommendations
in a biased recommender system.

2. The noise in estimation of the user-item rating matrix is assumed to be additive i.i.d. 
noise. This can be replaced by a more general noise model in which the elements of 
the estimated user-item matrix are independent across users, items and time. The 
independence assumption is used to model arbitrary errors which are unlikely to skew 
the estimated matrix in such a way as to give high preference to a small number of 
ineffective items uniformly across a large subset of users. 

3. We assume that a biased recommendation engine decides to show sponsored ads with
probability $\gamma$ (bias probability) independent of everything else. This assumption, again,
is used only for ease of exposition. It is sufficient to have an arbitrary $\gamma$ fraction of the
recommendations from the ad-pool, not necessarily chosen at random.

3 Anomaly Detection Algorithm and Theoretical Results

In this section, we describe the algorithm for detecting anomalous systems and provide
analysis of Type I and Type II errors as in binary hypothesis testing.

3.1 Anomaly Detection System

The problem is to design a test to detect if a recommendation engine is biased. In other 
words, the test has to decide between the following two hypotheses: 

• $H_1$ : “The recommendation engine is biased” and

• $H_0$ : “The recommendation engine is not biased.”

It is similar to a hypothesis testing problem except that the statistical distributions for the
two hypotheses are not well defined. The only a priori knowledge that is assumed is the
structure of a biased recommendation engine as specified in Section 2.2. But the specifics
of various parameters in the recommendation engine, such as the bias probability $\gamma$ and the
ad-pool $A$, are unknown. As in traditional hypothesis testing problems, we make use of multiple
data points obtained from many users who constitute the anomaly detection system. The 
anomaly detection system consists of a set of n players which is a subset of the user database 
in the recommendation system. These players can give accurate binary feedback (effective 
or ineffective) on the items recommended to them. Without loss of generality, we denote 
these players as users indexed from 1 to n in the user database. 

3.2 Algorithm 

We now describe an algorithm called Binary feedback Anomaly Detector (BiAD), that uses
the recommendations made to the players and their feedback to decide between one of the
two hypotheses. In every round of recommendation, each player is recommended an item by
the recommendation engine. In round $t$, the algorithm uses the feedback from the players
and computes for each item, the total number of players until that round who have been
recommended that item and found the item ineffective. This number is denoted by $B_i(t)$ for
item $i$. If the sum of the largest $A(t)$ of these numbers among all the items is greater than or
equal to a threshold $T(t)$, the recommendation engine is declared to be biased. Otherwise, the
same procedure is repeated in the next round. If the algorithm does not declare the engine
to be biased in $Q(m)$ rounds, then the hypothesis that the engine is biased is rejected.
Here, $A(t)$, $T(t)$ and $Q(m)$ in the algorithm are appropriately designed parameters (specified
in Theorem 1). The pseudocode for this algorithm is shown in Algorithm 1.

As opposed to the basic average test, this algorithm searches for concentration of a large
number of ineffective recommendations in a small set of items. Since the number of potential
advertisements is unknown, this algorithm makes decisions in real-time as it gets feedback
from the players. The larger the ad-pool, the larger the number of feedback samples required
to detect a biased engine. (The trade-off between various parameters is discussed in detail in
Section 4.) Therefore, the algorithm increases the size of the search set with progressing rounds of
recommendation. Also, note that the algorithm requires only binary feedback from the
players - whether the recommendations are effective or ineffective, which explains the name
of the algorithm.
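The contrast with the average test can be made concrete on toy data. In the sketch below, `spread` and `concentrated` are hypothetical per-item counts of ineffective recommendations (the $B_i(t)$ values) with the same total, so an average-based statistic cannot tell them apart while the top-$k$ sum can.

```python
import heapq
from typing import Sequence


def average_ineffective(counts: Sequence[int], total_recs: int) -> float:
    """Average test statistic: fraction of recommendations that were ineffective."""
    return sum(counts) / total_recs


def top_k_sum(counts: Sequence[int], k: int) -> int:
    """Concentration statistic: sum of the k largest per-item counts."""
    return sum(heapq.nlargest(k, counts))


total_recs = 24                    # recommendations observed so far
spread = [1] * 12                  # 12 ineffective recs scattered over 12 items
concentrated = [6, 6] + [0] * 10   # 12 ineffective recs piled on 2 items

# Both profiles have the same average (0.5), but very different top-2 sums.
print(average_ineffective(spread, total_recs),
      average_ineffective(concentrated, total_recs))
print(top_k_sum(spread, 2), top_k_sum(concentrated, 2))  # 2 versus 12
```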

The following theorem gives sufficient conditions for good performance of the algorithm.
Unlike in general hypothesis testing problems, we define Type I error, which corresponds to
false positives, only for objective systems. On the other hand, Type II error is used to refer
to missed detection in the case of a biased system. We do not give any guarantees for the
class of recommendation engines that are neither objective nor biased.

Algorithm 1 Binary feedback Anomaly Detector (BiAD)

    Initialize $t = 1$ (round 1).
    while $t \leq Q(m)$ do
        Compute $B_i(t)$ = number of players who have rated item $i$ ineffective up to round $t$, for all $i \in [m]$.
        Compute $S(t)$ = sum of the largest $A(t)$ values among $\{B_i(t)\}_{i=1}^{m}$.
        if $S(t) \geq T(t)$ then
            Stop and accept $H_1$.
        else
            $t = t + 1$.
        end if
    end while
    Stop and reject $H_1$.
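The detection loop can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `biad`, the input format (one list of `(item, effective)` pairs per round, one pair per player), and the toy threshold `3 * t` are all assumptions; a real run would plug in the $A(t)$, $T(t)$ and $Q(m)$ of Theorem 1.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Feedback = List[Tuple[int, bool]]  # (recommended item, was it effective?)


def biad(rounds: Iterable[Feedback], m: int, q_max: int,
         search_size: Callable[[int], int],
         threshold: Callable[[int], float]) -> bool:
    """Return True if the engine is declared biased within q_max rounds."""
    ineffective = [0] * m  # B_i(t): players who found item i ineffective so far
    for t, feedback in enumerate(rounds, start=1):
        if t > q_max:
            break
        for item, effective in feedback:
            if not effective:
                ineffective[item] += 1
        # S(t): sum of the largest A(t) counts among all items.
        s_t = sum(heapq.nlargest(search_size(t), ineffective))
        if s_t >= threshold(t):
            return True  # accept H1
    return False  # reject H1


def search_size(t: int) -> int:
    return t  # A(t) = t


def toy_threshold(t: int) -> float:
    return 3.0 * t  # hypothetical stand-in for T(t)


# Three players, six items. In the first stream everyone is pushed item 0.
rounds_biased = [[(0, False), (0, False), (0, False)]]
rounds_objective = [[(1, True), (2, False), (3, True)],
                    [(4, True), (5, True), (1, False)]]

print(biad(rounds_biased, 6, 5, search_size, toy_threshold))     # True
print(biad(rounds_objective, 6, 5, search_size, toy_threshold))  # False
```

Only the binary `effective` flag is consumed, so the sketch needs no ratings, matching the binary-feedback requirement of the algorithm.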

Theorem 1. Let the parameters in the detection algorithm, BiAD, satisfy the following equations:

$$A(t) = t, \tag{1}$$

$$T(t) = \exp\left(1 + W\left(\frac{1}{e}\left(\frac{(A(t) + c)\log m}{\mu(t)} - 1\right)\right)\right)\mu(t), \tag{2}$$

where $W(\cdot)$ represents the Lambert-W or product log function (for any $z \in \mathbb{R}$, $W(z)e^{W(z)} = z$), and

$$\mu(t) = E\left[\max_{\{\mathcal{A} \subset [m] : |\mathcal{A}| = A(t)\}} \sum_{u=1}^{n} \sum_{l=1}^{t} \sum_{i \in \mathcal{A}} \frac{\mathbb{1}\{R_{ui} < \eta\}}{F(\hat{R}_{ui}(l), \hat{R}_u(l)) - l + 1}\right], \tag{7}$$

with $c = 1/2$. Then BiAD gives the following guarantees on the error probabilities:

(I) Type I Error:

If the recommendation engine is objective, the probability that BiAD declares it to be
anomalous is $O(Q(m)/\sqrt{m})$.

(II) Type II Error:

If the recommendation engine is anomalous with an ad-pool of size $A$, and if

(a) the number of ads, $A \leq Q(m)$,

(b) the fraction of recommendations that are ads, i.e., the bias probability'-f = u , u: 

and 

(c) $\sum_{u=1}^{n} f_u(\eta, A) = o(\gamma n A)$, where $f_u(\eta, A)$ is the number of effective ads for user
$u$,

then the probability that BiAD does not declare the system as anomalous within A 
rounds is 

The proof of this theorem is presented in Section 6.
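A threshold of this kind can be evaluated numerically. The sketch below assumes the Chernoff-style form $T(t) = \exp\left(1 + W\left(\frac{1}{e}\left(\frac{(A(t)+c)\log m}{\mu(t)} - 1\right)\right)\right)\mu(t)$ and takes the mean $\mu(t)$ as an input supplied by the caller. The Newton solver `lambert_w` and the function `biad_threshold` are hypothetical helpers written for this illustration, not part of any library.

```python
import math


def lambert_w(z: float, tol: float = 1e-12) -> float:
    """Principal branch of Lambert W for z >= -1/e, solving w * exp(w) = z."""
    if z < -1.0 / math.e:
        raise ValueError("W(z) is real only for z >= -1/e")
    w = math.log1p(z)  # starts at or above the root, so Newton descends to it
    for _ in range(200):
        ew = math.exp(w)
        denom = ew * (w + 1.0)
        if denom == 0.0:  # only at the branch point w = -1
            break
        step = (w * ew - z) / denom
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def biad_threshold(mu_t: float, search_size: int, m: int, c: float = 0.5) -> float:
    """T(t) for a given mean mu(t), search-set size A(t) and database size m."""
    k = (search_size + c) * math.log(m)
    ratio = math.exp(1.0 + lambert_w((k / mu_t - 1.0) / math.e))
    return ratio * mu_t


# Illustrative numbers only: mu(t) = 2, A(t) = 3, m = 1000 items.
T = biad_threshold(2.0, 3, 1000)
x = T / 2.0
# x solves x*log(x) - x + 1 = (A(t) + c) * log(m) / mu(t), the exponent
# that makes a multiplicative Chernoff tail equal to m^{-(A(t) + c)}.
print(x * math.log(x) - x + 1.0, 3.5 * math.log(1000) / 2.0)
```

Under this form, a union bound over the at most $m^{A(t)}$ candidate sets leaves a per-round false-alarm probability of order $m^{-c}$, which is consistent with the $O(Q(m)/\sqrt{m})$ Type I guarantee for $c = 1/2$.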


4 Discussion 

In this section, we discuss how the error probabilities depend on the parameters of the 
problem. 


4.1 Choice of Threshold 


Note that computation of the threshold function $T(t)$ as specified in Theorem 1 (given
by Equation (2)) requires knowledge of the noise statistics and also the players’ opinions
about all the items in the database. More precisely, since $\hat{R}(t) = R + \epsilon(t)$, computation
of the expectation $\mu(t)$ (see Equation (7)) requires knowledge of $R_u$ and also the distribution of the
estimation noise, $\epsilon_u(t)$. The noise statistics reflect the accuracy of the learning strategy of
the recommendation engine, and it is possible that these statistics are unknown or cannot
be estimated. Moreover, it might also be difficult to obtain the players’ opinions about all
the items in the database. To overcome this difficulty, a practical implementation of the
algorithm could use an approximation of the unknown quantity. We now propose one way
to compute such an approximation. Note that


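$$\mu(t) = E\left[\max_{\{\mathcal{A} \subset [m] : |\mathcal{A}| = A(t)\}} \sum_{u=1}^{n} \sum_{l=1}^{t} \sum_{i \in \mathcal{A}} \frac{\mathbb{1}\{R_{ui} < \eta\}}{F(\hat{R}_{ui}(l), \hat{R}_u(l)) - l + 1}\right] \leq E\left[\max_{\{\mathcal{A} \subset [m] : |\mathcal{A}| = A(t)\}} \sum_{u=1}^{n} \sum_{l=1}^{t} \sum_{i \in \mathcal{A}} \frac{\mathbb{1}\{R_{ui} < \eta\}}{F(\eta + \epsilon_{ui}(l), \hat{R}_u(l)) - l + 1}\right],$$

where the inequality follows since $R_{ui} < \eta$ implies $\hat{R}_{ui}(l) < \eta + \epsilon_{ui}(l)$, and $F(r, \hat{R}_u(l))$, with
$\hat{R}_u(l) = R_u + \epsilon_u(l)$, is a non-increasing function of $r$. We
assume that the estimates of the ratings are not skewed in one direction, and therefore the
noise has zero mean. Since the noise statistics are unknown, we could approximate the right
hand side of the above inequality by substituting the noise term with its mean. With this
approximation, each denominator $F(\eta + \epsilon_{ui}(l), \hat{R}_u(l)) - l + 1$ on the right hand side of the
inequality can be substituted with

$$f_u(\eta, [m]) - l + 1,$$

where $f_u(\eta, [m])$ is the total number of effective items in the database for user $u$. Depending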
on the application, it might be relatively easy to estimate this number or at least estimate 
a lower bound for this number. As an example, one could roughly estimate that for every 
user there are effective items among the m items in the database. We observe in our 
simulations that a rough estimate of $f_u(\eta, [m])$ is sufficient to obtain good results.

Note that over-estimation (under-estimation) of $T(t)$ decreases the probability of Type
I (Type II) error and increases the probability of Type II (Type I) error. In other words,
the higher the value of $T(t)$, the lower is the probability of Type I error and the higher
is the probability of Type II error. Therefore, the risk associated with false positives and
missed detection could serve as a guideline for the choice of the threshold function. In our
simulations (Section 5), we propose a practical threshold function that gives a good balance
between the two error probabilities for most scenarios.


4.2 Effect of Parameters on Performance 

Theorem 1 gives guarantees on the asymptotic performance of BiAD as the size of the
database grows large. These guarantees depend on various parameters in the algorithm as
well as the recommendation engine. From the analytic bounds derived in Theorem 1, we
analyze in this section the trade-off between these parameters to understand the conditions
under which the algorithm shows good performance. We see that the theoretical results
support our intuitive understanding about the conditions under which a biased system can be
distinguished from an objective system. These results are also corroborated by our simulation
results described in Section 5.

In Section 4.1, we considered the effect of the choice of the threshold parameter on the error
probabilities. We now discuss the effect of other parameters.


Number of Rounds in the Test and Size of the Ad-Pool. It is seen (from Result (I))
that the upper bound on the probability of Type I error increases with increasing number
of rounds. This is expected, since more rounds give more chances to falsely declare a system biased.
For Type I error to go to zero as the size of the database goes to infinity, it is sufficient if
$Q(m) = o(\sqrt{m})$.

Guarantees for detection of a biased engine (Result (II)) are dependent on various
parameters. One of the conditions is that the ad-pool is not very large (Condition (II)a).
Specifically, it is sufficient if the size of the ad-pool, $A$, is at most the maximum number
of rounds of recommendations, $Q(m)$. Therefore, increasing $Q(m)$ (the number of rounds
of testing) enables detection of larger ad-pools but also increases the probability of Type
I error. Intuitively, a small ad-pool conforms with our definition of a biased
recommendation engine as one that favors a few items over many others and therefore facilitates easier
detection.

Number of Effective Ads. For correct detection of a biased engine, it is also required
that the average number of effective ads (averaged over all players) is not very large
(Condition (II)c). A large number of effective ads enables the recommendation system to customize
ads according to users’ tastes and is contradictory to our interpretation of a biased system
which recommends ads that do not match with users’ preferences.

Number of Players. The dependence on the number of players $n$ is seen in two respects - it
determines the minimum bias probability at which detection is guaranteed (Condition (II)b)
and also the probability of Type II error. Both these results show that a large number of
players improves the prospect of correct identification, which can be explained by the fact that
a large sample size supports better statistical analysis.


Number of Effective Items. The minimum bias probability at which detection is ensured 
is also determined by the average number of effective items in the entire database. This can 
be seen from the term in (Condition (II)b). The no estimation noise case = 0 


for all u,i,t) is useful in understanding the term When there is no noise, = 
^ Y^=i 'nf=i f (r) j • Therefore, has an inverse relation with the number 

of effective items in the database. This conveys that a large number of effective items 
facilitates better detection of a biased engine. Intuitively, a large number of effective items 
in the database helps in clearer demarcation of an objective engine from a biased one. With 
many effective items in the database, an objective system would have a higher probability of
recommending effective items, while a biased system always makes at least a $\gamma$ fraction of its
recommendations from the ad-pool where the number of effective ads is limited.


Bias Probability. The more a biased engine recommends from the ad-pool, the more
apparent is its biased behavior. The fraction of total recommendations that are from the
ad-pool is captured by the bias probability, $\gamma$. We see that the probability of Type II error
decays exponentially with increasing $\gamma$, and also that larger $\gamma$ facilitates easier anomaly
detection (Conditions (II)b and (II)c).


Choice of $c = 1/2$. In the course of the proof of Theorem 1, we prove that for any
choice of $c$, the Type I error is bounded by $O(Q(m)\, m^{-c})$. Hence, by changing $c$ in the
algorithm, one can control the error probability. However, the downside of increasing $c$ is
that it effectively increases the threshold $T(t)$, which results in requiring the bias probability
$\gamma = \omega(\log m\,(1 + c/A)/n)$.
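A quick numerical reading of this trade-off, ignoring the constant hidden in the $O(\cdot)$ and using made-up values of $m$ and $Q(m)$, is sketched below; `type1_bound` is a hypothetical helper that simply evaluates $Q(m)\, m^{-c}$.

```python
def type1_bound(num_rounds: int, m: int, c: float) -> float:
    """Order of the Type I error bound, Q(m) * m^(-c), without constants."""
    return num_rounds * m ** (-c)


m = 10 ** 6        # database size (illustrative)
num_rounds = 100   # Q(m), well below sqrt(m) = 1000

for c in (0.5, 1.0):
    # c = 0.5 gives roughly 0.1; c = 1.0 gives roughly 1e-4.
    print(c, type1_bound(num_rounds, m, c))
```

Doubling $c$ from 1/2 to 1 shrinks the bound by a factor of $\sqrt{m}$ here, at the cost of a larger threshold and hence a larger minimum detectable bias probability.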


4.3 Applications 

The proposed anomaly detection algorithm is readily applicable in the retail market. It 
can identify recommender systems that dole out sponsored advertisements in the garb of 
personalized recommendations. In this era of personalization, there are numerous other 
applications, two of which are described below. These two examples also illustrate the 
advantage of BiAD in requiring simple binary feedback, allowing it to be applied in a wide 
variety of scenarios. 


Search Engine Bias. Search engine bias is one of the most important ethical issues
surrounding search engines, and its social implications have been studied for more than a decade.
The Stanford Encyclopedia of Philosophy describes search engine bias as
non-neutrality of search engines, where “search algorithms do not use objective criteria” or “favor
some values/sites over others in generating their list of results for search queries.” A
sponsored search engine in the late 1990s called GoTo ranked its search results purely based
on bids from advertisers. It was evidently unsuccessful due to users’ mistrust of paid
searches and was eventually acquired by Yahoo! (after being renamed Overture). Google also uses
an auction to sell ads but displays them physically separated from organic search results.

The pros and cons of enforcing transparency in the algorithms used for generating search
results have also been examined. Even in the absence of total transparency, anomaly
detection systems such as BiAD could be useful in identifying bias in search engines. With
personalization being extended to search results, search engines virtually act
as recommendation engines. With a large number of potential search results, our model of
recommender systems with a large database fits well in this problem where biased search
engines correspond to biased recommender systems. In addition to search engines, this
example can be extended to identify hidden sponsored advertisements in social networking
sites and online news portals, all of which use personalization algorithms.

Pharmaceutical Lobby. Pharmaceutical lobbying is another controversial issue that
affects many parts of the world [22, 23, 24, 25]. Among its many aspects, we focus on the
marketing practices of large pharmaceutical companies which manipulate the opinions of
doctors, health care providers and law-makers by providing biased information and through
other tactics [26, 27]. There have been allegations that big drug companies influence
physicians to prescribe their highly priced branded drugs even when other better or cheaper
alternatives are available [28, 29].

Again, our interpretation of a biased recommendation engine is well-suited to model this
scenario. Since drugs are prescribed on a person-to-person basis, health care providers can
be viewed as recommendation service providers who recommend drugs to patients, and the
lobbying drug companies act as advertisers. A health care system that favors a few
incompetent (expensive or ineffective) drugs in spite of the availability of other better
(cheaper or more effective) alternatives matches well with our definition of a biased
recommendation engine. With data samples consisting of prescriptions and their efficacy on
patients, anomaly detection algorithms like BiAD could help watchdog agencies in identifying
such malpractices.

5 Numerical Results 

We evaluate our algorithm through offline simulations, with careful considerations for
ensuring proximity to real world scenarios.

5.1 Simulation Setup 

Given below is a detailed description of the methods we adopt in our simulations to replicate
the different components of a recommender system.

User-Item Database. Estimating users’ opinions about all items in a database is essential
for real-world recommender systems, and hence for simulating those recommender systems
as well. However, the ground truth for such data sets is not available, since in existing data
sets each user typically rates only a small subset of items, and those ratings are also noisy
and possibly biased.

For a complete user-item rating matrix, we take available sparse data sets ([30, 31, 32])
and renormalize the ratings on a linear scale from zero to ten. The missing entries in the
sparse matrix are then filled in using the matrix completion algorithm from [?]. For the
purpose of simulating real-world user opinions, we consider this completed matrix as ground
truth. We evaluate our algorithm on three data sets:

D1 a subset of the Amazon cellphones and accessories data set [30] with 3671 users and
8728 items,

D2 a subset of the Netflix Prize data set [31] with 2951 users and 9259 movies, and

D3 a subset of the MovieLens 10M ratings data set [32] with 3671 users and 8729 movies.

Recommendation Engine. Due to non-transparency of recommendation strategies, it is
not exactly known how recommendation engines behave. As a representation of the learning
strategies used by these systems, we use two learning algorithms popular in the literature:

L1 Matrix factorization. Specifically, we use the inexact ALM method proposed in [?].

L2 User-based collaborative filtering (with Pearson correlation as the similarity metric
[?]).

To simulate the temporal dynamics of a recommender system, the recommendation engine
is initially supplied a sparse subset of the user-item ratings chosen according to a power-law
degree distribution observed in real-world data sets [?]. (Specifically, the number of
feedback entries from each user is chosen from a Pareto(3, 3) distribution.) In each round of
recommendation, the engine recommends one item to each user and observes the users’ feedback
about the recommended items. It periodically updates its estimate of the users’ preferences
based on this feedback. In our experiments, we set the frequency of these updates to once
in every 5 rounds.
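
The initial sparse sample can be sketched as follows. This is only an illustration under
stated assumptions: pareto(3, 3) is read here as a Pareto distribution with scale (minimum) 3
and shape 3, each draw is rounded and capped at the number of items, and the function names
are hypothetical rather than taken from the simulation code.

```python
import random

def initial_feedback_counts(num_users, num_items, shape=3.0, scale=3.0, seed=0):
    """Draw how many ratings each user reveals to the engine before round 1.

    Assumption: pareto(3, 3) means scale (minimum value) 3 and shape 3.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(num_users):
        # random.paretovariate(alpha) has minimum value 1, so rescale it.
        draw = scale * rng.paretovariate(shape)
        counts.append(min(num_items, int(round(draw))))
    return counts

def initial_observations(counts, num_items, seed=0):
    """For each user, pick which items have their ratings revealed initially."""
    rng = random.Random(seed)
    return [set(rng.sample(range(num_items), c)) for c in counts]

counts = initial_feedback_counts(num_users=5, num_items=50)
observed = initial_observations(counts, num_items=50)
```

Most users reveal only a handful of ratings under this reading, while a few reveal many,
which is the heavy-tailed behaviour the power-law assumption is meant to capture.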

It is natural for a recommendation engine to have an explore component to address
the cold start problem and have wider coverage of the database [?]. Therefore, in all our
experiments we invoke random exploration for a 0.1 fraction of the recommendations made. In
a recommendation meant for exploration, an item is chosen uniformly at random from the
database, provided it has not been shown previously.

For all other recommendations which do not explore, we use the following recommendation
strategy: for each user, the items are ranked according to the estimated preferences. To
make a recommendation, an objective recommendation engine chooses the highest ranked
item among the items not yet recommended. The recommendation strategy of a biased
engine follows the description in Section 2.2.2: for any recommendation, with probability
$1 - \gamma$, like an objective engine, it recommends the highest ranked item, and with
probability $\gamma$, it recommends an item from the ad-pool. We consider two kinds of ad
selection strategies. From among the ads that have not already been recommended to the user,

A1 An ad item is chosen uniformly at random.

A2 The ad item which has the highest ranking is chosen.

Strategy A2 corresponds to customization of ads according to users’ tastes. Note that this
is harder to detect than strategy A1 since it has a higher likelihood of recommending effective
ads.
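
The recommendation strategies described above can be sketched as a single per-round choice.
This is a minimal sketch, not the simulation code used for the experiments: the function
`recommend`, its arguments, and the order in which the explore and ad coin flips are drawn
are assumptions made for illustration.

```python
import random

def recommend(est_ratings, shown, ad_pool, gamma, ad_strategy="A1",
              explore_prob=0.1, rng=random):
    """Return the index of the item recommended to one user in one round.

    est_ratings: the engine's estimated ratings for this user (list of floats).
    shown:       set of items already recommended to this user.
    ad_pool:     list of ad items; gamma = 0 corresponds to an objective engine.
    """
    fresh = [i for i in range(len(est_ratings)) if i not in shown]
    if not fresh:
        raise ValueError("every item has already been recommended")
    if rng.random() < explore_prob:
        return rng.choice(fresh)  # exploration: uniformly random unseen item
    fresh_ads = [i for i in ad_pool if i not in shown]
    if fresh_ads and rng.random() < gamma:
        if ad_strategy == "A1":
            return rng.choice(fresh_ads)  # A1: uniformly random unseen ad
        return max(fresh_ads, key=lambda i: est_ratings[i])  # A2: top-ranked ad
    return max(fresh, key=lambda i: est_ratings[i])  # objective: top-ranked item
```

Calling `recommend` with `gamma=0.0` reproduces the objective engine, so the same routine
can drive both hypotheses in a simulation.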

Anomaly Detection System. For players in the anomaly detection system, we randomly
choose a subset of users from the data set. In experiments which test the performance of the
algorithm with increasing number of players, we choose the subset of players incrementally.

As explained in Sections [?] and 3, the algorithm requires feedback samples of efficacy of the
recommendations made to the players. For our experiments, we adopt the characterization
of efficacy used in our theoretical model given by Definition 1. Note that the number of
effective items in the database for any user depends on the efficacy threshold $\eta$. To be
able to test the algorithm for different numbers of effective items, we choose a different
efficacy threshold for each of the data sets. Specifically, we set $\eta = 5.5$, $8.0$, and $8.8$
for data sets D1, D2, and D3, which correspond to an average number of effective items of 80,
250, and 150 respectively.
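
The binary feedback and the count of effective items can be sketched as below. The sketch
assumes that an item is effective for a user exactly when its ground-truth rating is at
least the efficacy threshold (the complement of the event "rating below the threshold");
the helper names and the toy ratings are hypothetical.

```python
def effective_items(ratings_row, eta):
    """Indices of the items that one user finds effective (rating >= eta)."""
    return [i for i, r in enumerate(ratings_row) if r >= eta]

def binary_feedback(ratings_row, item, eta):
    """1 if the recommended item is effective for this user, else 0."""
    return 1 if ratings_row[item] >= eta else 0

def average_effective_count(rating_matrix, eta):
    """Average number of effective items per user."""
    total = sum(len(effective_items(row, eta)) for row in rating_matrix)
    return total / len(rating_matrix)

# Toy ground-truth ratings on the zero-to-ten scale (made-up numbers).
toy_ratings = [[9.0, 2.0, 6.0], [5.5, 5.4, 10.0]]
avg = average_effective_count(toy_ratings, eta=5.5)
```

Raising the threshold shrinks every user's set of effective items, which is how a single
data set can be tuned to a target average such as 80, 250, or 150.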


5.2 Results 


We evaluate the performance of BiAD with variations in different parameters of the
recommender system and the anomaly detection system. To demonstrate its effectiveness in
different settings, we present performance results for various combinations of data sets
(D1-D3) and recommendation algorithms (L1, L2, A1, A2). An objective recommendation engine
is represented by its learning algorithm (L1 or L2) while a biased recommendation engine
is represented by its learning algorithm (L1 or L2) and its ad-recommendation strategy (A1
or A2). Although we present experimental results only for specific combinations because of
space limitations, other settings give similar trade-offs. Specifically, the simulation results
corroborate our theoretical analysis of the trade-offs between various parameters in Section [?].

We now describe how the performance depends on the choice of various parameters in
BiAD. We set the parameter $A(t)$ according to Equation (1) in all the simulations. Other
parameters of the algorithm are discussed below.


Threshold. As explained in Section 4.1, varying the threshold parameter $T(t)$ in the
algorithm affects Type I and Type II error probabilities in opposite ways. Using lower values
of the threshold increases the probability of Type I error and decreases that of Type II error.
The threshold given by Equation (?) is designed to ensure, irrespective of the number of
players in the anomaly detection system, a low probability of false positives (Type I error)
even when the estimated user preferences are noisy. This is especially important if the risk
associated with false implication of an objective recommendation engine is high.

We observe that a less conservative threshold gives a better balance between the two types
of errors. Specifically, we use a threshold that can be proved to guarantee low error rates
under the assumption that the recommendation engine’s estimates of user preferences are
accurate. This threshold, denoted by $T'(t)$, is equal to the value of $\rho(t)$ given by
Equation (?). In all our simulations, we show the performance of BiAD for both these
threshold choices. Simulation results show that $T'(t)$ gives better performance than $T(t)$
except in one case (Figure 2b) where the two choices give similar performance.

(a) Data set: D1, Algorithm: L2 + A1, $n = 100$, $\gamma = 0.45$, $A = 8$.

(b) Data set: D2, Algorithm: [?] + A2, $n = 100$, $\gamma = 0.35$, $A = 8$.

Figure 1: The number of rounds in the test, $Q(m)$, affects the number of ads that can be
detected: at least 8 and 15 rounds are required for $T'(t)$ and $T(t)$ respectively.


In both these thresholds, $E[P_u(l)]$ in Equation (2) is substituted with the right hand
side of (?). This requires knowledge of the number of effective items for each player in the
anomaly detection system. In our simulations, BiAD approximates this with the average
number of effective items for all the users. Effectively, it uses the following value of $p(t)$
instead of Equation (?):


pit) = 

i=i 


nA{t) 


( 9 ) 


where $f([m])$ is an estimate of the average number of effective items in the database $[m]$.
We assume that this average number is not very difficult to estimate and in all the results,
unless specified, BiAD has an accurate estimate of this number.


Number of Rounds in the Test. The number of rounds of recommendation $Q(m)$ affects
the error probability. This is seen in Figure 1, which shows the variation of the sum of Type I
and Type II errors with $Q(m)$. The Type I error rate is in fact close to zero for all values of
$Q(m)$ for both the thresholds, so the plots effectively show Type II error rates. Theorem 1
guarantees detection of a biased engine if $Q(m) \ge A$. The plots show that BiAD detects
8 ads if $Q(m)$ is at least 8 and 15 for $T'(t)$ and $T(t)$ respectively. For all the remaining
simulations, we set the parameter $Q(m) = 40$.

Number of Players. A larger number of players in the anomaly detection system means a
higher number of input samples to the algorithm, and as expected, the algorithm performs
better as this number increases. In Figure 2, we plot the sum of Type I and Type II error
rates with increasing number of users. To detect a biased engine with the specified value of
$\gamma$, these plots show that 70 and 100 players respectively are sufficient when $T'(t)$ and
$T(t)$ are chosen to be the threshold parameter. We use 100 players in all other simulations.

(a) Data set: D1, Algorithm: [?] + A1, $A = 8$, $\gamma = 0.45$.

(b) Data set: D2, Algorithm: [?] + A1, $A = 8$, $\gamma = 0.35$.

Figure 2: The performance improves with the number of collaborating users $n$.



Figure 3: Variation of Type I + Type II error rates with size of data set (x-axis: number of
items $m$). Data set: D2, Algorithm: L1 + [?], $A = 8$, $\gamma = 0.35$, $n = 100$. When there
are more choices to recommend, the user satisfaction with objective recommender systems
improves, making detection easier.


In addition to the choice of parameters in the algorithm, various aspects of the
recommender system affect the performance of BiAD. These are described below.


Size of the Database. Theorem 1 shows that BiAD performs well for recommender systems
with large item databases. Databases of varying size are constructed by sub-sampling items
from the original data set. Figure 3 shows the variation of Type I and Type II errors with
the size of the database. $T(t)$ and $T'(t)$ have very similar performance for the parameters
in this experiment. The plot shows that, for detection of 8 ads recommended 35 percent of
the time, the algorithm is effective for databases of size 1500 items or larger.

We now demonstrate that BiAD has been appropriately designed to identify biased engines
that systematically recommend (make a sizable fraction of recommendations) from a
small ad-pool.

(a) Data set: D1, Algorithm: L1 + A1, $n = 100$, $\gamma = 0.45$.

(b) Data set: D2, Algorithm: [?] + A2, $n = 100$, $\gamma = 0.35$.

Figure 4: As the size of the ad-pool $A$ increases, the (personalized) ads become similar to
effective recommendations, making the bias hard to detect (Type II error is large).


Size of the Ad-Pool. Theorem 1 shows that BiAD guarantees detection of a biased engine
that has a small ad-pool. The same effect is also observed in simulations: Figure 4 shows
the rate of missed detection (Type II error rate) with varying size of the ad-pool. It is seen
that both the thresholds perform well for a small number of ads, while threshold $T'(t)$ can
detect an ad-pool of size up to 25.

Bias Probability. The bias probability $\gamma$ quantifies the intensity of bias of the
recommendation engine. Plots (Figure 5) for Type II error rate with $\gamma$ show that more
biased (higher $\gamma$) engines are easier to detect.

Estimate of the Number of Effective Items. In all the simulations above, it is assumed
that BiAD has an accurate estimate of the average number of effective items ($f([m])$), which
is used to determine the threshold parameter in the algorithm (see Equation (9)). Note
that overestimation of this parameter lowers the threshold parameter, thereby increasing the
probability of Type I error and decreasing the probability of Type II error. Figure 6 shows
the effect of variations in this estimate for data set D3, which has an average of 150 effective
items. We observe that $T'(t)$ performs well for a wide range of estimates. In the case of
$T(t)$, it is safer to overestimate the parameter $f([m])$ than to underestimate it.

5.3 Ineffectiveness of Basic Average Test 

As explained in the introduction (Section 1), we demonstrate the inability of the basic
average test to distinguish between random errors and deliberate promotion of ads. This
test computes the average rating across all recommendations and decides between the two
hypotheses based on a threshold parameter. With the specifics of the recommendation
strategy (explore probability) unknown, it is difficult to estimate the right value of the
threshold. For an explore probability of 0.1, Figure 7 shows the performance of the basic
average test for different values of the threshold, denoted $r$. It is seen that threshold
values around 3 give

(a) Data set: D2, Algorithm: L1 + A2, $n = 100$, $A = 8$.

(b) Data set: D3, Algorithm: [?] + A2, $n = 100$, $A = 8$.

Figure 5: Type II error rate decreases as the bias probability $\gamma$ increases.

Figure 6: Variation of Type I + Type II error rates with perturbations in the algorithm’s
estimate of the average number of effective items $f([m])$. Data set: D3, Algorithm: L1 +
[?], $A = 8$, $\gamma = 0.4$, $n = 100$.

Figure 7: Variation of Type I and Type II error rates with threshold $r$ for the basic average
test shows that the naive approach is sensitive to the choice of parameter $r$. Data set: D1,
Algorithm: L1 + A1, $A = 8$, $\gamma = 0.45$, $n = 100$, Explore Probability = 0.1.

Figure 8: Variation of Type I error rates with variation in explore probability shows that
the threshold for the basic average test is sensitive to the value of the explore probability.
Data set: D1, Algorithm: L1, $n = 100$.


the best performance. But, as shown in Figure 8, this same threshold value fails for other
values of explore probability. For example, the basic average test falsely declares an objective
recommendation engine with 20 percent explore probability as biased. This shows that the
correct choice of $r$ is sensitive to the explore probability. In contrast, note that BiAD has
nearly zero Type I error rate for all values of explore probability.
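
The sensitivity of the basic average test can be illustrated with a toy calculation. The
sketch below assumes that the test declares the engine biased when the mean rating of the
recommended items falls below the threshold, and it uses made-up ratings (9 for a top-ranked
item, 4 for a randomly explored item) and a made-up threshold; none of these numbers come
from the experiments.

```python
def basic_average_test(ratings, r):
    """Declare 'biased' (True) when the mean rating of recommendations is below r."""
    return sum(ratings) / len(ratings) < r

def objective_ratings(num_rounds, explore_rounds, top_rating=9.0, random_rating=4.0):
    """Ratings seen from an objective engine that explores in some of its rounds."""
    return [top_rating] * (num_rounds - explore_rounds) + [random_rating] * explore_rounds

r = 8.2  # hypothetical threshold tuned for a 10 percent explore probability
flag_explore_10 = basic_average_test(objective_ratings(10, 1), r)  # mean 8.5
flag_explore_20 = basic_average_test(objective_ratings(10, 2), r)  # mean 8.0
```

With these numbers the same objective engine passes the test at a 10 percent explore
probability and is flagged at 20 percent, which mirrors the failure mode described above.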


6 Proofs 

Before we provide the proof of the main theorem, we state and prove two lemmas used in 
proving the main theorem. 


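Lemma 2. For any objective recommendation algorithm and for any user $u$, item $i$, and
time $t < F(R_{ui}(t), \mathbf{R}_u(t)) + 1$, the probability that item $i$ is recommended to
user $u$ at time $t$ is upper bounded by

$$W_{ui}(t) \le \frac{1}{F(R_{ui}(t), \mathbf{R}_u(t)) - t + 1}. \tag{10}$$

Proof.

$$
\begin{aligned}
1 &\overset{(a)}{=} \sum_{j=1}^{m} W_{uj}(t) \\
&\ge \sum_{j=1}^{m} \mathbb{1}\{R_{uj}(t) \ge R_{ui}(t)\} \Big(1 - \sum_{l=1}^{t-1} \mathbb{1}_{uj}(l)\Big) W_{uj}(t) \\
&\overset{(b)}{\ge} \sum_{j=1}^{m} \mathbb{1}\{R_{uj}(t) \ge R_{ui}(t)\} \Big(1 - \sum_{l=1}^{t-1} \mathbb{1}_{uj}(l)\Big) W_{ui}(t) \\
&= \Big|\Big\{ j : R_{uj}(t) \ge R_{ui}(t),\ \sum_{l=1}^{t-1} \mathbb{1}_{uj}(l) = 0 \Big\}\Big|\, W_{ui}(t) \\
&\ge \Big( \big|\{ j : R_{uj}(t) \ge R_{ui}(t) \}\big| - \Big|\Big\{ j : \sum_{l=1}^{t-1} \mathbb{1}_{uj}(l) \ne 0 \Big\}\Big| \Big) W_{ui}(t) \\
&= \big( F(R_{ui}(t), \mathbf{R}_u(t)) - (t-1) \big)\, W_{ui}(t),
\end{aligned}
$$

where (a) and (b) follow from the characterization of an objective recommendation
algorithm in Section 2.2.1: equality (a) is due to the fact that $\mathbf{W}(t)$ is a stochastic
matrix (Property 1), and inequality (b) is due to the monotonic property satisfied by the weight
matrix (Property 3). The above inequality gives the desired bound in (10) for
$t < F(R_{ui}(t), \mathbf{R}_u(t)) + 1$. □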

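Lemma 3. Let $\{X_i, i = 1, \ldots, k\}$ be independent Bernoulli random variables with means
$\{p_i, i = 1, \ldots, k\}$, and let $\sum_{i=1}^{k} p_i \le p$. Then,

$$P\Big[\sum_{i=1}^{k} X_i \ge T\Big] \le \exp\Big(-T \log\frac{T}{p} + T - p\Big) \quad \forall\, T > p.$$

Proof. Using the Chernoff bound for independent random variables, we have, for any $\theta > 0$,

$$
\begin{aligned}
P\Big[\sum_{i=1}^{k} X_i \ge T\Big]
&\le e^{-\theta T} \prod_{i=1}^{k} E\big[e^{\theta X_i}\big] \\
&= e^{-\theta T} \prod_{i=1}^{k} \big(p_i e^{\theta} + 1 - p_i\big) \\
&\le e^{-\theta T} \Big(\frac{1}{k} \sum_{i=1}^{k} \big(p_i e^{\theta} + 1 - p_i\big)\Big)^{k} \\
&\le e^{-\theta T} \Big(\frac{p}{k}\big(e^{\theta} - 1\big) + 1\Big)^{k},
\end{aligned}
$$

where the second inequality follows from the fact that the geometric mean of non-negative
numbers is at most their arithmetic mean, and the last inequality follows for $\theta > 0$, which
is true for the choice of $\theta^{*} = \log\big((k-p)T / (p(k-T))\big)$ for any $T > p$. Then, we get

$$
\begin{aligned}
P\Big[\sum_{i=1}^{k} X_i \ge T\Big]
&\le \Big(\frac{p(k-T)}{(k-p)T}\Big)^{T} \Big(\frac{(k-p)T}{(k-T)k} + 1 - \frac{p}{k}\Big)^{k} \\
&= \Big(\frac{p}{T}\Big)^{T} \Big(\frac{k-p}{k-T}\Big)^{k-T} \\
&= \Big(\frac{p}{T}\Big)^{T} \Big(1 + \frac{T-p}{k-T}\Big)^{k-T} \\
&\le \Big(\frac{p}{T}\Big)^{T} e^{T-p} \\
&= \exp\Big(-T \log\frac{T}{p} + T - p\Big).
\end{aligned}
$$

□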
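
The bound of Lemma 3 can be sanity-checked numerically. The sketch below compares it with
the exact upper tail of a binomial distribution, which is the special case of identical
means; the parameter values are arbitrary and the helper names are hypothetical.

```python
import math

def chernoff_upper(p_sum, T):
    """Right-hand side of Lemma 3: exp(-T log(T/p) + T - p), for T > p > 0."""
    return math.exp(-T * math.log(T / p_sum) + T - p_sum)

def binomial_upper_tail(k, q, T):
    """Exact P[X >= T] for X ~ Binomial(k, q) and an integer T."""
    return sum(math.comb(k, j) * q**j * (1 - q)**(k - j) for j in range(T, k + 1))

k, q, T = 200, 0.05, 25            # the sum of means is k * q = 10 < T
exact = binomial_upper_tail(k, q, T)
bound = chernoff_upper(k * q, T)
```

The exact tail should never exceed the bound; how loose the bound is depends on how far
$T$ lies above the sum of the means.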


Proof of Theorem 1. The proof of the theorem consists of two parts which give upper bounds
for the probability of Type I and Type II errors.


Type I Error 


The algorithm makes a Type I error if it declares an objective recommendation engine to be
biased. This section shows that the probability that the algorithm makes a Type I error is
low.

Suppose that the recommendation engine uses an objective recommendation algorithm.
We first bound the probability that BiAD accepts $H_1$ in round $t$. Recall that the algorithm
accepts $H_1$ in round $t$ if $S(t) \ge T(t)$, and that

$$S(t) = \max_{\{\mathcal{T} \subseteq [m] :\, |\mathcal{T}| = A(t)\}} \sum_{i \in \mathcal{T}} B_i(t).$$
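
The statistic $S(t)$, the largest total of the ineffective-rating counts $B_i(t)$ over item
subsets of the prescribed size, does not require enumerating subsets: a maximum over
subsets of a fixed size is attained by the largest counts. The sketch below shows this
shortcut next to a brute-force check; the function names and the toy counts are hypothetical.

```python
from itertools import combinations

def bias_statistic(ineffective_counts, subset_size):
    """S(t): sum of the subset_size largest values of B_i(t)."""
    top = sorted(ineffective_counts, reverse=True)[:subset_size]
    return sum(top)

def bias_statistic_brute_force(ineffective_counts, subset_size):
    """Same quantity by enumerating every subset of the given size (small inputs only)."""
    return max(sum(subset) for subset in combinations(ineffective_counts, subset_size))

# Toy counts B_i(t): how many players rated item i ineffective so far.
B = [0, 3, 1, 5, 2]
S = bias_statistic(B, 2)
```

Sorting costs on the order of m log m operations, whereas the brute-force enumeration grows
combinatorially with the number of items.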

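Consider a fixed $\mathcal{A} \subseteq [m]$ such that $|\mathcal{A}| = A(t)$. We first bound the
probability that $\sum_{i \in \mathcal{A}} B_i(t) \ge T(t)$ and then use a union bound over all
possible $\mathcal{A}$ to obtain an upper bound on the probability that $S(t) \ge T(t)$. In this
direction, we define

$$X_u(l) := \sum_{i \in \mathcal{A}} \mathbb{1}_{ui}(l)\, \mathbb{1}\{R_{ui} < \eta\}. \tag{11}$$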


Note that $X_u(l)$ is equal to 1 if in round $l$, player $u$ is recommended some item from
set $\mathcal{A}$ that is not effective and is 0 otherwise. Given
$\{\mathbf{W}(l), \mathbf{R}(l), l \in [t]\}$, we have that $\{X_u(l), l \in [t], u \in [n]\}$ are
Bernoulli random variables independent of each other. For every $1 \le u \le n$ and
$1 \le l \le t$, the mean of $X_u(l)$ can be bounded as follows:

$$E\Big[X_u(l) \,\Big|\, \{\mathbf{W}(l'), \mathbf{R}(l')\}_{l'=1}^{t}\Big] = \sum_{i \in \mathcal{A}} W_{ui}(l)\, \mathbb{1}\{R_{ui} < \eta\} \le P_u(l), \tag{12}$$

where the inequality follows from the upper bound for $W_{ui}(l)$ from Lemma 2 and the
definition of $P_u(l)$ (?). From (2), we have

$$\sum_{u=1}^{n} \sum_{l=1}^{t} E[P_u(l)] \le p(t).$$

Note that, since the noise in the rating estimates is independent across time and users,
$\{P_u(l), l \in [t], u \in [n]\}$ are independent random variables. We now use the Chernoff bound
in Lemma 3 to obtain a probabilistic upper bound on their sum:

$$P\Big[\sum_{u=1}^{n} \sum_{l=1}^{t} P_u(l) \ge \rho(t)\Big] \le \exp\Big(-\rho(t) \log\frac{\rho(t)}{p(t)} + \rho(t) - p(t)\Big).$$

By the definition of the Lambert W function,

$$\rho(t) \Big(\log\frac{\rho(t)}{p(t)} - 1\Big) = (A(t) + c) \log m - p(t),$$

which further implies that

$$P\Big[\sum_{u=1}^{n} \sum_{l=1}^{t} P_u(l) \ge \rho(t)\Big] \le \exp\big(-(A(t) + c) \log m\big). \tag{13}$$
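
The Lambert W step behind (13) can be made concrete numerically. The sketch below assumes
that the threshold is the value $x > p$ solving $x(\log(x/p) - 1) = K - p$, where $p$ stands for
the bound on the sum of the means and $K$ for $(A(t) + c)\log m$; the names `p`, `K`, `rho`,
and `lambert_w` are hypothetical, and the Newton iteration is a simple stand-in for a library
routine.

```python
import math

def lambert_w(x, iterations=50):
    """Principal branch of the Lambert W function for x >= 0, via Newton's method."""
    w = math.log1p(x)  # starting point; W(x) <= log(1 + x) for x >= 0
    for _ in range(iterations):
        e = math.exp(w)
        w -= (w * e - x) / (e * (w + 1))
    return w

def rho(p, K):
    """Solve x * (log(x / p) - 1) = K - p for x > p, assuming K > p > 0."""
    return p * math.exp(1 + lambert_w((K - p) / (math.e * p)))

p, K = 10.0, 50.0
x = rho(p, K)
# With this choice the Chernoff exponent collapses to exactly -K.
exponent = -x * math.log(x / p) + x - p
```

Substituting $x = p\,e^{1 + w}$ turns the defining equation into $w e^{w} = (K - p)/(e\,p)$,
which is why the Lambert W function appears.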


We now proceed to obtain a probabilistic upper bound for the sum
$\sum_{u=1}^{n} \sum_{l=1}^{t} X_u(l)$. By inequality (12), the sum of the corresponding means
has the following upper bound:

$$\sum_{u=1}^{n} \sum_{l=1}^{t} E\Big[X_u(l) \,\Big|\, \{\mathbf{W}(l'), \mathbf{R}(l')\}_{l'=1}^{t}\Big] \le \sum_{u=1}^{n} \sum_{l=1}^{t} P_u(l).$$

Since $\{X_u(l), l \in [t], u \in [n]\}$ are independent Bernoulli random variables given
$\{\mathbf{W}(l), \mathbf{R}(l), l \in [t]\}$, Lemma 3 can be used as before. If
$\sum_{u=1}^{n} \sum_{l=1}^{t} P_u(l) \le \rho(t)$, then Lemma 3 gives

$$P\Big[\sum_{u=1}^{n} \sum_{l=1}^{t} X_u(l) \ge T(t) \,\Big|\, \{\mathbf{W}(l'), \mathbf{R}(l')\}_{l'=1}^{t}\Big] \le \exp\big(-(A(t) + c) \log m\big). \tag{14}$$

The above upper bound can be derived in exactly the same manner as the upper bound in
(13). Combining this upper bound with inequality (13), we have

$$
\begin{aligned}
P\Big[\sum_{u=1}^{n} \sum_{l=1}^{t} X_u(l) \ge T(t)\Big]
&\le P\Big[\sum_{u=1}^{n} \sum_{l=1}^{t} X_u(l) \ge T(t) \,\Big|\, \sum_{u=1}^{n} \sum_{l=1}^{t} P_u(l) \le \rho(t)\Big]
 + P\Big[\sum_{u=1}^{n} \sum_{l=1}^{t} P_u(l) > \rho(t)\Big] \\
&\le 2 \exp\big(-(A(t) + c) \log m\big).
\end{aligned}
\tag{15}
$$

Now, recall that $B_i(t)$ is the number of players who have rated item $i$ ineffective up to
round $t$, which can be mathematically written as
$B_i(t) = \sum_{u=1}^{n} \sum_{l=1}^{t} \mathbb{1}_{ui}(l)\, \mathbb{1}\{R_{ui} < \eta\}$. Using
definition (11), we have

$$\sum_{i \in \mathcal{A}} B_i(t) = \sum_{u=1}^{n} \sum_{l=1}^{t} X_u(l),$$

which gives us the following equivalent form of inequality (15):

$$P\Big[\sum_{i \in \mathcal{A}} B_i(t) \ge T(t)\Big] \le 2 \exp\big(-(A(t) + c) \log m\big). \tag{16}$$


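We can now take a union bound over all possible $\mathcal{A}$ to bound the probability that BiAD
accepts $H_1$ in round $t$.

$$
\begin{aligned}
P[S(t) \ge T(t)]
&= P\Big[\bigcup_{\{\mathcal{A} \subseteq [m] :\, |\mathcal{A}| = A(t)\}} \Big\{\sum_{i \in \mathcal{A}} B_i(t) \ge T(t)\Big\}\Big] \\
&\le \sum_{\{\mathcal{A} \subseteq [m] :\, |\mathcal{A}| = A(t)\}} P\Big[\sum_{i \in \mathcal{A}} B_i(t) \ge T(t)\Big] \\
&\le \binom{m}{A(t)}\, 2 \exp\big(-(A(t) + c) \log m\big) \\
&\le 2 \exp(-c \log m) \\
&= 2 m^{-c},
\end{aligned}
$$

where the second inequality follows from (16). Further taking a union bound over all rounds,

$$
P[\text{Type I Error}] = P\Big[\bigcup_{t=1}^{Q(m)} \{S(t) \ge T(t)\}\Big]
\le \sum_{t=1}^{Q(m)} P[S(t) \ge T(t)]
\le 2\, Q(m)\, m^{-c}.
$$

This shows that BiAD declares an objective recommendation engine as biased with
probability $O(Q(m)/\sqrt{m})$ for the choice of $c = 1/2$.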
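
The two union bounds can be evaluated numerically to see how large the database has to be
before the Type I guarantee becomes meaningful. The sketch assumes the per-subset bound
$2\exp(-(A + c)\log m)$ and the final bound $2\,Q(m)\,m^{-c}$; the function names and the
example values of $m$ are chosen only for illustration.

```python
import math

def subset_union_bound(m, A, c=0.5):
    """C(m, A) * 2 * exp(-(A + c) log m): union bound over all item subsets of size A."""
    return math.comb(m, A) * 2 * math.exp(-(A + c) * math.log(m))

def type_one_bound(m, Q, c=0.5):
    """2 * Q * m^(-c): union bound over Q rounds."""
    return 2 * Q * m ** (-c)

per_round = subset_union_bound(10_000, 8)          # at most 2 * m^(-c)
small_db = type_one_bound(10_000, 40)              # 0.8: still a weak guarantee
large_db = type_one_bound(1_000_000, 40)           # 0.08
```

Since the number of subsets of size $A$ is at most $m^{A}$, the per-round bound never exceeds
$2m^{-c}$; with $c = 1/2$ the overall bound shrinks only like $1/\sqrt{m}$, so it is
informative mainly for large databases.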


Type II Error 

The algorithm makes a Type II error if it does not detect a biased recommendation engine,
i.e., it declares $H_0$ when $H_1$ is true. Suppose that the recommendation engine is biased with
an ad-pool $\mathcal{A}$ of size $|\mathcal{A}| = A$. Fix a $\delta \in (0, 1)$. We prove that
$\sum_{u=1}^{n} \sum_{l=1}^{A} \sum_{i \in \mathcal{A}} \mathbb{1}_{ui}(l)$, which is equal to the
number of ad recommendations to the $n$ players until round $A$, is at least
$\gamma(1 - \delta)nA$ with high probability. For any $1 \le u \le n$, let $Y_u(l) = 1$ if the
biased recommendation engine decides to recommend from the ad-pool to user $u$ in round $l$.
Note that $\sum_{i \in \mathcal{A}} \mathbb{1}_{ui}(l) \ge Y_u(l)$ and that
$\{Y_u(l), l \in [A], u \in [n]\}$ are i.i.d. Bernoulli random variables with mean $\gamma$.
This gives us

$$
\begin{aligned}
P\Big[\sum_{u=1}^{n} \sum_{l=1}^{A} \sum_{i \in \mathcal{A}} \mathbb{1}_{ui}(l) < \gamma(1 - \delta)nA\Big]
&\le P\Big[\sum_{u=1}^{n} \sum_{l=1}^{A} Y_u(l) < \gamma(1 - \delta)nA\Big] \\
&\le \exp\Big(-\frac{\delta^{2}}{2}\, \gamma nA\Big),
\end{aligned}
\tag{17}
$$

where the last inequality follows from a version of the Chernoff bound given in [?] for sums of
i.i.d. Bernoulli random variables.
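
The concentration step can be checked against an exact binomial computation. The sketch
assumes the standard multiplicative Chernoff lower-tail form $\exp(-\delta^{2}\mu/2)$ with
$\mu = \gamma nA$, and treats the number of ad recommendations as a Binomial($nA$, $\gamma$)
count; the function names and the value $\delta = 0.2$ are illustrative choices.

```python
import math

def lower_tail_bound(gamma, delta, n, A):
    """exp(-(delta^2 / 2) * gamma * n * A): Chernoff bound on too few ad recommendations."""
    return math.exp(-(delta ** 2) / 2 * gamma * n * A)

def binomial_lower_tail(trials, q, threshold):
    """Exact P[X < threshold] for X ~ Binomial(trials, q)."""
    upper = math.ceil(threshold)
    return sum(math.comb(trials, j) * q**j * (1 - q)**(trials - j) for j in range(upper))

gamma, delta, n, A = 0.45, 0.2, 100, 8
exact = binomial_lower_tail(n * A, gamma, gamma * (1 - delta) * n * A)
bound = lower_tail_bound(gamma, delta, n, A)
```

Both quantities decay as either the number of players or the bias probability grows, which
is the qualitative behaviour the Type II analysis relies on.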

The detection algorithm makes the correct decision if $S(t) \ge T(t)$ in some round
$t \le Q(m)$. We show that $S(A) \ge T(A)$ with high probability. Since $A \le Q(m)$, the
algorithm makes the correct decision with high probability. Now, suppose that the number
of ad recommendations to the $n$ players until round $A$ is at least $\gamma(1 - \delta)nA$.
Since the total number of effective ads to the $n$ players is $o(\gamma nA)$, the total number of
ineffective recommendations from the ad-pool until round $A$ is $\gamma\,\Omega(nA)$.
Consequently,

$$
\begin{aligned}
S(A) &= \max_{\{\mathcal{T} \subseteq [m] :\, |\mathcal{T}| = A(A)\}} \sum_{i \in \mathcal{T}} B_i(A) \\
&\ge \sum_{i \in \mathcal{A}} B_i(A) \\
&\ge \gamma(1 - \delta)nA - o(\gamma nA) \\
&= \gamma\,\Omega(nA).
\end{aligned}
\tag{18}
$$

To prove that $T(A)$ does not exceed the right hand side of inequality (18), we consider the
following cases:

(i): /3(7l) > e 

Since hT(-) is an increasing function in [0, cxo), we have W > hT(l) > Now, 



< 


iA + c) log m 

liT) 

logm 


n 


OinA), 


where the second equality follows from the definition of the Lambert W function, and
the last inequality follows by using the definition of [?] given by (3).

(ii): /3(A) < e, /3(A) > e


p{A) = exp ( 1 + ly ( ^ ) ) p(A)
/3(A)


■p(A)
(A + c) log m


W(l) 


T{A) = exp 1 + W'


\ ^

< exp (1 + ly( 1 )) p(A)
logm
- 0 {nA).


(m\ 


p{A) 


(iii): /3(A) < e, /3(A) < e 


r(A) = exp (1 + ly j 1 P(A) 

< exp (1 + ly(1)) p(A) 

< exp(2 + 2iy(l))p(A) 
p{A) 


nA 


- 0 {nA). 


Since $\gamma = \omega(\ldots)$, combining the results from the above three cases with
inequality (18) gives that $S(A) \ge T(A)$ for $m$ large enough. Therefore, the algorithm declares
the correct hypothesis in round $A$ if the number of ad recommendations to the $n$ players until
round $A$ is at least $\gamma(1 - \delta)nA$. We can therefore use the concentration inequality in
(17) to bound the probability of Type II error.

$$
\begin{aligned}
P[\text{Type II Error}] &\le P[S(A) < T(A)] \\
&\le P\Big[\sum_{u=1}^{n} \sum_{l=1}^{A} \sum_{i \in \mathcal{A}} \mathbb{1}_{ui}(l) < \gamma(1 - \delta)nA\Big] \\
&\le \exp\Big(-\frac{\delta^{2}}{2}\, \gamma nA\Big).
\end{aligned}
$$


This shows that the probability of Type II error decays exponentially with the number of
players and the bias probability. □

7 Conclusion


We propose an algorithm that can identify a biased recommendation engine that
systematically favors a few sponsored advertisements over other genuine recommendations. We
formulate a probabilistic model for recommender systems and give theoretical guarantees for
our detection algorithm based on this model. Specifically, we show that the probabilities of
missed detection and false positives are low for recommender systems with large databases.
We show through simulations that the algorithm performs well for many data sets and
different types of recommendation algorithms. In an age when both personalization and
advertising have become very prevalent, this kind of anomaly detection algorithm is relevant
in a wide variety of scenarios. We demonstrate how our detection algorithm can be applied
to problems such as identification of search engine bias and pharmaceutical lobbying. It
would be interesting to investigate ways of deploying such an anomaly detection mechanism
in practical settings.